Skip to content

Conversation

@pymia
Copy link

@pymia pymia commented Sep 23, 2025

Implements SageMaker Serverless Inference endpoints as requested in issue #23148.

  • Add ServerlessProductionVariantProps interface with maxConcurrency, memorySizeInMB, and provisionedConcurrency
  • Extend EndpointConfig to support serverless variants alongside existing instance variants
  • Add comprehensive validation for serverless configuration parameters
  • Enforce mutual exclusivity between instance and serverless variants
  • Add CloudFormation template generation for ServerlessConfig properties
  • Include extensive test coverage for validation scenarios and error cases

Issue # 23148

Closes #23148.

Reason for this change

AWS SageMaker Serverless Inference is not supported in the CDK SageMaker L2 constructs. Users can only configure instance-based endpoints, missing the serverless option for intermittent/unpredictable traffic patterns that could benefit from cost-effective serverless inference.

This feature was explicitly planned in the original SageMaker Endpoint L2 construct RFC with Instance-prefixed classes designed to make room for Serverless-prefixed analogs.

Description of changes

Implements AWS SageMaker Serverless Inference support in CDK SageMaker L2 constructs, enabling cost-effective serverless endpoints for intermittent workloads:

  • New ServerlessProductionVariantProps interface extending ProductionVariantProps with AWS-compliant serverless properties:
    • maxConcurrency: 1-200 range (required)
    • memorySizeInMB: 1024-6144MB in 1GB increments (required)
    • provisionedConcurrency: 1-200 range, optional, must be ≤ maxConcurrency
  • New addServerlessProductionVariant() method with comprehensive input validation
  • Extended EndpointConfigProps with optional serverlessProductionVariant property
  • Mutual exclusivity enforcement between instance and serverless variants per AWS constraints
  • Single serverless variant limit per endpoint configuration (AWS limitation)
  • Comprehensive synthesis-time validation with clear, actionable error messages
  • CloudFormation integration leveraging existing L1 construct ServerlessConfig support

Usage Example:

import * as sagemaker from '@aws-cdk/aws-sagemaker-alpha';

declare const model: sagemaker.IModel;

// Create serverless endpoint configuration
const endpointConfig = new sagemaker.EndpointConfig(this, 'ServerlessEndpointConfig', {
  serverlessProductionVariant: {
    model: model,
    variantName: 'serverlessVariant',
    maxConcurrency: 10,
    memorySizeInMB: 2048,
    provisionedConcurrency: 5, // optional
  },
});

Describe any new or updated permissions being added

N/A - No new IAM permissions required. Leverages existing SageMaker model and endpoint permissions.

Description of how you validated changes

  • Unit tests: Added 12 comprehensive serverless variant tests covering all validation scenarios:

    • Memory size validation (1024-6144MB in 1GB increments)
    • Concurrency range validation (1-200 for both max and provisioned)
    • Mutual exclusivity enforcement between instance and serverless variants
    • Single serverless variant limit per AWS constraints
    • Cross-environment model compatibility validation
    • Error condition testing with clear error messages
    • CloudFormation template generation verification
  • Integration tests: Extended existing integration test with serverless endpoint configuration, verified CloudFormation template generation with correct ServerlessConfig properties:

    ServerlessEndpointConfig:
      Type: AWS::SageMaker::EndpointConfig
      Properties:
        ProductionVariants:
          - ServerlessConfig:
              MaxConcurrency: 10
              MemorySizeInMB: 2048
              ProvisionedConcurrency: 5
            VariantName: serverlessVariant
  • Comprehensive testing results: 63/63 unit tests pass (100% success rate), 4/4 integration tests pass, no regressions detected across 16,024+ CDK tests

Checklist


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@github-actions github-actions bot added effort/medium Medium work item – several days of effort feature-request A feature should be added or improved. p1 beginning-contributor [Pilot] contributed between 0-2 PRs to the CDK labels Sep 23, 2025
@aws-cdk-automation aws-cdk-automation requested a review from a team September 23, 2025 08:29
Copy link
Collaborator

@aws-cdk-automation aws-cdk-automation left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This review is outdated)

@pymia pymia force-pushed the feature/sagemaker-serverless-variants-23148 branch 2 times, most recently from aad0c97 to 78ef21c Compare September 23, 2025 13:31
@pahud pahud marked this pull request as draft September 23, 2025 14:23
@pahud pahud self-assigned this Sep 23, 2025
@pahud
Copy link
Contributor

pahud commented Sep 23, 2025

taking a look.

@pahud
Copy link
Contributor

pahud commented Sep 23, 2025

❌ Features must contain a change to a README file.
❌ Features must contain a change to an integration test file and the resulting snapshot.

As this is a new feat we need

  1. update README with very focusd and minimal description.
  2. add new intet test or refresh existing relevant integ tests and update snapshots

@aws-cdk-automation aws-cdk-automation dismissed their stale review September 23, 2025 15:05

✅ Updated pull request passes all PRLinter validations. Dismissing previous PRLinter review.

@pymia pymia marked this pull request as ready for review September 23, 2025 15:37
@pahud pahud marked this pull request as draft September 23, 2025 15:37
@pymia pymia force-pushed the feature/sagemaker-serverless-variants-23148 branch from 04fc444 to 5ff7875 Compare September 24, 2025 15:31
@pymia pymia force-pushed the feature/sagemaker-serverless-variants-23148 branch 4 times, most recently from 2ab372b to d8a868d Compare September 29, 2025 14:58
@pahud pahud removed their assignment Sep 29, 2025
@pahud pahud marked this pull request as ready for review September 29, 2025 16:29
@abidhasan-aws abidhasan-aws self-requested a review September 30, 2025 12:52
@abidhasan-aws abidhasan-aws self-assigned this Sep 30, 2025
@abidhasan-aws abidhasan-aws removed their request for review September 30, 2025 13:40
@abidhasan-aws abidhasan-aws removed their assignment Sep 30, 2025
@abidhasan-aws abidhasan-aws self-requested a review September 30, 2025 14:49
@abidhasan-aws abidhasan-aws self-assigned this Sep 30, 2025
Copy link
Contributor

@abidhasan-aws abidhasan-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @pymia,
Thanks for your contribution. I have left some comments :)


// Validate mutual exclusivity
if (props.instanceProductionVariants && props.serverlessProductionVariant) {
throw new Error('Cannot specify both instanceProductionVariants and serverlessProductionVariant. Choose one variant type.');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't able to find any documentation that says instanceProductVariant and serverlessProductVariant cannot be used simultaneously for a single endpoint. Could you please provide the source that refers to this restriction?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The instance based deployment and serverless deployment should not exist at the same time.
Reference: Amazon SageMaker Deploy Model, and AWS::SageMaker::EndpointConfig ProductionVariant.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CDK has one README per service. SageMaker-alpha already has a README, so we don't need to create a new one.
We can add the documentation related to this PR in packages/@aws-cdk/aws-sagemaker-alpha/README.md.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved with the latest commit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can keep the integration test in one file. We have another integration test file integ.endpoint-config. We can put all the necessary integration-test related code in that file and remove this one.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved with the latest commit.

pymia pushed a commit to pymia/aws-cdk that referenced this pull request Oct 14, 2025
- Remove standalone readme_serverless_section.md file
- Remove enhanced_integ_test.ts file
- Consolidate serverless tests into existing integ.endpoint-config.ts
- Add comprehensive serverless test cases (minimal, full, boundary values)
- Maintain existing documentation in main SageMaker README
- Keep mutual exclusivity validation with AWS docs justification

Addresses review comments in PR aws#35557
@mergify mergify bot dismissed abidhasan-aws’s stale review October 14, 2025 12:49

Pull request has been modified.

pymia and others added 4 commits October 15, 2025 16:41
Implements SageMaker Serverless Inference endpoints as requested in issue aws#23148.

- Add ServerlessProductionVariantProps interface with maxConcurrency, memorySizeInMB, and provisionedConcurrency
- Extend EndpointConfig to support serverless variants alongside existing instance variants
- Add comprehensive validation for serverless configuration parameters
- Enforce mutual exclusivity between instance and serverless variants
- Add CloudFormation template generation for ServerlessConfig properties
- Include extensive test coverage for validation scenarios and error cases

Closes aws#23148
…less inference

- Add comprehensive serverless inference documentation to SageMaker alpha README
- Update integration test with serverless endpoint configuration examples
- Include verification comments for both instance-based and serverless endpoints
- Generate CloudFormation snapshots with proper ServerlessConfig properties

Addresses reviewer feedback requiring README documentation and integration test coverage for the new serverless inference feature.
…ch AWS specs

- Update maxConcurrency validation range from 1-200 to 1-1000

- Update provisionedConcurrency validation range from 1-200 to 1-1000

- Fix memory size documentation from 3008MB to 3072MB in requirements

- Add comprehensive test coverage for upper bound validation

- Update TypeScript definitions and JSDoc comments

This aligns the implementation with AWS SageMaker serverless endpoint specifications and RFC 431 requirements for L2 constructs.
- Remove standalone readme_serverless_section.md file
- Remove enhanced_integ_test.ts file
- Consolidate serverless tests into existing integ.endpoint-config.ts
- Add comprehensive serverless test cases (minimal, full, boundary values)
- Maintain existing documentation in main SageMaker README
- Keep mutual exclusivity validation with AWS docs justification

Addresses review comments in PR aws#35557
@pymia pymia force-pushed the feature/sagemaker-serverless-variants-23148 branch from 4a225e0 to 83c635e Compare October 15, 2025 14:41
@abidhasan-aws abidhasan-aws added the pr/needs-integration-tests-deployment Requires the PR to deploy the integration test snapshots. label Oct 16, 2025
Copy link
Contributor

@abidhasan-aws abidhasan-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making the changes earlier. I have left a few more comments.

Thanks. :)

}

// validate instance variant limits
if (hasInstanceVariants && this._instanceProductionVariants.length > 10) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think hasInstanceVariants is redundant here, if (this._instanceProductionVariants.length > 10) should be enough

### Serverless Inference

Amazon SageMaker Serverless Inference is a purpose-built inference option that makes it easy for you to deploy and scale ML models. Serverless endpoints automatically launch compute resources and scale them in and out depending on traffic, eliminating the need to choose instance types or manage scaling policies.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can add the link to the doc for further reference:
SageMaker ServerLess Inference

* Render the serverless production variant.
*/
private renderServerlessProductionVariant(): CfnEndpointConfig.ProductionVariantProperty[] {
if (!this.serverlessProductionVariant) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should throw an error in this case. The design is to only call renderServerlessProductionVariant when serverlessProductionVariant is defined. Therefore, if serverlessProductionVariant is not defined, it should be treated as an error.

*/
private renderInstanceProductionVariants(): CfnEndpointConfig.ProductionVariantProperty[] {
this.validateProductionVariants();
return this._instanceProductionVariants.map( v => ({
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add a validation here. If the instanceProductionVariant is empty we can throw an error.

},
});

new IntegTest(app, 'integtest-endpointconfig', {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To ensure the resources were deployed as intended, we can perform assertions using API calls.

We can do the assertion like this:

const integ = new IntegTest(app, 'integtest-endpointconfig', {
  testCases: [stack],
});

// Verify instance-based endpoint config
integ.assertions.awsApiCall('SageMaker', 'describeEndpointConfig', {
  EndpointConfigName: endpointConfig.endpointConfigName,
}).expect(ExpectedResult.objectLike({
  ProductionVariants: [
    { VariantName: 'firstVariant', InstanceType: 'ml.m5.large' },
    { VariantName: 'secondVariant' },
    { VariantName: 'thirdVariant' },
  ],
}));

// Verify serverless endpoint config
integ.assertions.awsApiCall('SageMaker', 'describeEndpointConfig', {
  EndpointConfigName: serverlessEndpointConfig.endpointConfigName,
}).expect(ExpectedResult.objectLike({
  ProductionVariants: [{
    VariantName: 'serverlessVariant',
    ServerlessConfig: {
      MaxConcurrency: 10,
      MemorySizeInMB: 2048,
      ProvisionedConcurrency: 5,
    },
  }],
}));

*
* The above command will result in the following output.
*
* For instance-based endpoint config, the above command will result in the following output:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are adding the assertions, we can remove this commented part. We are verifying what is created in the stack by the API call anyway.

@abidhasan-aws abidhasan-aws added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. and removed pr/needs-integration-tests-deployment Requires the PR to deploy the integration test snapshots. labels Oct 17, 2025
@github-actions
Copy link
Contributor

This PR has not received a response in a while. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.

@github-actions github-actions bot added the closing-soon This issue will automatically close in 4 days unless further comments are made. label Oct 17, 2025
@abidhasan-aws abidhasan-aws added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. and removed response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. labels Oct 22, 2025
@github-actions github-actions bot added closed-for-staleness This issue was automatically closed because it hadn't received any attention in a while. and removed closing-soon This issue will automatically close in 4 days unless further comments are made. labels Oct 22, 2025
@github-actions github-actions bot closed this Oct 22, 2025
@abidhasan-aws abidhasan-aws reopened this Oct 28, 2025
@abidhasan-aws abidhasan-aws removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Oct 28, 2025
@github-actions
Copy link
Contributor

This issue has been reopened and is now available for discussion.

@abidhasan-aws abidhasan-aws removed the closed-for-staleness This issue was automatically closed because it hadn't received any attention in a while. label Oct 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

beginning-contributor [Pilot] contributed between 0-2 PRs to the CDK effort/medium Medium work item – several days of effort feature-request A feature should be added or improved. p1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sagemaker: Support serverless variants for endpoints

4 participants